%load_ext autoreload
%autoreload 2
import scdiff_init
import os
import subprocess
import CSHMM_analysis
%matplotlib inline
Model initialization with scdiff: http://www.cs.cmu.edu/~jund/scdiff/
data_file="treutlein2014"
scdiff_init.run_scdiff_init(data_file)
Train CSHMM model and select best iteration with cross validation, it will take some time. We set the sample points to 10 instead of 100 in the paper and number of genes to 3000 to save time here. The cross_validation setting can be turned on and the number of iteration can also be adjusted.
cmd = "python CSHMM_train.py \
--data_file treutlein2014 \
--structure_file init_cluster_treutlein2014.txt \
--n_split 100 -ng 16000 \
--n_iteration 2 \
--cross_validation 0 \
--random_seed 0 \
--model_name lung_developmental"
result_stdout = subprocess.check_output(cmd.split(" "))
result_stdout
print result_stdout
print model structure and cell assignment
model_file = "lung_developmental_it2.pickle"
#import CSHMM_train as ML
CSHMM_analysis.plot_path_fig(model_file,data_file,circle_size=20)
Load marker genes for lung and neuron dataset Note that you can change the init_marker_gene_dict function in CSHMM_analysis to your own set of markers
CSHMM_analysis.marker_gene_list,CSHMM_analysis.marker_genes_dict,CSHMM_analysis.treutlein2014_mkgene,CSHMM_analysis.treutlein2014_mkgene_dict,CSHMM_analysis.treutlein2016_mkgene,CSHMM_analysis.treutlein2016_mkgene_dict=CSHMM_analysis.init_marker_gene_dict()
Generate GO analysis files for each path
model_ana_temp=CSHMM_analysis.analyze_gene(model_file,data_file=data_file,out_folder = "lung_results")
Analyze the pair of paths to show differential expression genes and do GO analysis
model_ana_temp = CSHMM_analysis.analyze_path_difference(model_ana_temp,('AT1',(7,4)),('AT2',(5,4)),"")
model_ana_temp = CSHMM_analysis.analyze_path_difference(model_ana_temp,('ciliated',tuple([2,0])),('Clara',tuple([3,0])),"")
Plot continuous gene expression
CSHMM_analysis.plot_cont_marker_gexp(model_ana_temp,remove_path=tuple([6,1,0]))